87 research outputs found

    Segmentation Based Mesh Denoising

    Full text link
    Feature-preserving mesh denoising has received noticeable attention recently. Many methods often design great weighting for anisotropic surfaces and small weighting for isotropic surfaces, to preserve sharp features. However, they often disregard the fact that small weights still pose negative impacts to the denoising outcomes. Furthermore, it may increase the difficulty in parameter tuning, especially for users without any background knowledge. In this paper, we propose a novel clustering method for mesh denoising, which can avoid the disturbance of anisotropic information and be easily embedded into commonly-used mesh denoising frameworks. Extensive experiments have been conducted to validate our method, and demonstrate that it can enhance the denoising results of some existing methods remarkably both visually and quantitatively. It also largely relaxes the parameter tuning procedure for users, in terms of increasing stability for existing mesh denoising methods

    In-Place Gestures Classification via Long-term Memory Augmented Network

    Full text link
    In-place gesture-based virtual locomotion techniques enable users to control their viewpoint and intuitively move in the 3D virtual environment. A key research problem is to accurately and quickly recognize in-place gestures, since they can trigger specific movements of virtual viewpoints and enhance user experience. However, to achieve real-time experience, only short-term sensor sequence data (up to about 300ms, 6 to 10 frames) can be taken as input, which actually affects the classification performance due to limited spatio-temporal information. In this paper, we propose a novel long-term memory augmented network for in-place gestures classification. It takes as input both short-term gesture sequence samples and their corresponding long-term sequence samples that provide extra relevant spatio-temporal information in the training phase. We store long-term sequence features with an external memory queue. In addition, we design a memory augmented loss to help cluster features of the same class and push apart features from different classes, thus enabling our memory queue to memorize more relevant long-term sequence features. In the inference phase, we input only short-term sequence samples to recall the stored features accordingly, and fuse them together to predict the gesture class. We create a large-scale in-place gestures dataset from 25 participants with 11 gestures. Our method achieves a promising accuracy of 95.1% with a latency of 192ms, and an accuracy of 97.3% with a latency of 312ms, and is demonstrated to be superior to recent in-place gesture classification techniques. User study also validates our approach. Our source code and dataset will be made available to the community.Comment: This paper is accepted to IEEE ISMAR202

    Masked Autoencoders in 3D Point Cloud Representation Learning

    Full text link
    Transformer-based Self-supervised Representation Learning methods learn generic features from unlabeled datasets for providing useful network initialization parameters for downstream tasks. Recently, self-supervised learning based upon masking local surface patches for 3D point cloud data has been under-explored. In this paper, we propose masked Autoencoders in 3D point cloud representation learning (abbreviated as MAE3D), a novel autoencoding paradigm for self-supervised learning. We first split the input point cloud into patches and mask a portion of them, then use our Patch Embedding Module to extract the features of unmasked patches. Secondly, we employ patch-wise MAE3D Transformers to learn both local features of point cloud patches and high-level contextual relationships between patches and complete the latent representations of masked patches. We use our Point Cloud Reconstruction Module with multi-task loss to complete the incomplete point cloud as a result. We conduct self-supervised pre-training on ShapeNet55 with the point cloud completion pre-text task and fine-tune the pre-trained model on ModelNet40 and ScanObjectNN (PB\_T50\_RS, the hardest variant). Comprehensive experiments demonstrate that the local features extracted by our MAE3D from point cloud patches are beneficial for downstream classification tasks, soundly outperforming state-of-the-art methods (93.4%93.4\% and 86.2%86.2\% classification accuracy, respectively).Comment: Accepted to IEEE Transactions on Multimedi

    Don't worry about mistakes! Glass Segmentation Network via Mistake Correction

    Full text link
    Recall one time when we were in an unfamiliar mall. We might mistakenly think that there exists or does not exist a piece of glass in front of us. Such mistakes will remind us to walk more safely and freely at the same or a similar place next time. To absorb the human mistake correction wisdom, we propose a novel glass segmentation network to detect transparent glass, dubbed GlassSegNet. Motivated by this human behavior, GlassSegNet utilizes two key stages: the identification stage (IS) and the correction stage (CS). The IS is designed to simulate the detection procedure of human recognition for identifying transparent glass by global context and edge information. The CS then progressively refines the coarse prediction by correcting mistake regions based on gained experience. Extensive experiments show clear improvements of our GlassSegNet over thirty-four state-of-the-art methods on three benchmark datasets
    • …
    corecore